
implement RecHit SOA and move to new framework #322

Closed

Conversation


@VinInn commented Apr 15, 2019

In this PR (on top of #312 and #318):
- TrackingRecHit is introduced as CUDAFormats;
- the RecHit producer is migrated to the new framework (#100);
- clients are migrated.
TBD: rename and clean up.

All three workflows have been tested.

The doublet builder has also been moved to use constant memory, following the Hackathon investigation (a 10% speed-up in the kernel).
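
For illustration, here is a minimal sketch of the constant-memory pattern referred to above, written as standalone CUDA; all names and values are placeholders, not the actual doublet-builder code:

    #include <cuda_runtime.h>

    // Hypothetical cut parameters kept in __constant__ memory: every thread reads
    // the same values, so the reads are served from the constant cache.
    __constant__ float c_cuts[2];

    __global__ void selectDoublets(const float* __restrict__ dr, bool* __restrict__ keep, int n) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n)
        keep[i] = (dr[i] > c_cuts[0]) && (dr[i] < c_cuts[1]);
    }

    void uploadCuts() {
      const float cuts[2] = {0.01f, 0.3f};
      // copy the host values into the __constant__ symbol once, before launching kernels
      cudaMemcpyToSymbol(c_cuts, cuts, sizeof(cuts));
    }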

rovere and others added 30 commits September 3, 2018 12:26
…atrices of static dimensions to run on the GPUs
…o use matrices of static dimensions in order to run on the GPUs.
- deleted the forgotten prints and time measurements;
- created a new modifier for the broken line fit;
- switched back from tipMax=1 to tipMax=0.1 (the change may be made in another PR);
- restored the original order of the cuts on chi2 and tip;
- deleted the default label to pixelFitterByBrokenLine;
- switched from CUDA_HOSTDEV to __host__ __device__;
- BrokenLine.h now uses dynamically sized matrices (the advantage over statically sized ones is that the code also works with n > 4) and, as before, the switch can easily be made at the start of the file;
- hence, the GPU test now needs an increased stack size (at least 1761 bytes);
- some doxygen comments in BrokenLine.h have been updated.

VinInn commented Apr 15, 2019

Use
VinInn/cmssw@gpuSmartAllocDoublets...VinInn:gpuNewRecHits
to review the changes introduced by this PR.

@VinInn requested a review from makortel, April 15, 2019 12:17

@makortel left a comment


I need to take another look, but here is a first round of comments for the RecHit SOA+migration commits.



m_store16 = cs->make_device_unique<uint16_t[]>(nHits*n16,stream);
m_store32 = cs->make_device_unique<float[]>(nHits*n32+11+(1+TrackingRecHit2DSOAView::Hist::wsSize())/sizeof(float),stream);

@makortel

The arrays are not necessarily 128-byte-aligned, right? (Or am I missing something?)

@VinInn (Author)

Right, I was thinking of introducing a stride function that computes the required stride as ((n*b+127)/128)*128/b, and of using it as stride(nHits, 4) and stride(nHits, 2) consistently.
It is also true that these arrays are accessed mostly randomly, so alignment does not make much of a difference.

@makortel

Ok.
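
For illustration, a self-contained sketch of the stride helper described above; the function name and the values in the checks are assumptions, the formula is the one quoted in the comment, and per-column 128-byte alignment only holds if the base pointer is itself 128-byte aligned:

    #include <cassert>
    #include <cstdint>

    // Round n elements of b bytes each up so that every column spans a multiple of 128 bytes.
    constexpr uint32_t stride(uint32_t n, uint32_t b) {
      return ((n * b + 127) / 128) * 128 / b;
    }

    int main() {
      assert(stride(1000, 4) == 1024);  // 1000 floats  -> 1024 elements (4096 bytes)
      assert(stride(1000, 2) == 1024);  // 1000 int16_t -> 1024 elements (2048 bytes)
      return 0;
    }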


view->m_charge = (int32_t *)get32(8);
view->m_xsize = (int16_t *)get16(2);
view->m_ysize = (int16_t *)get16(3);

@makortel

Could reinterpret_cast be used here?

@VinInn (Author)

What difference does it make?

@makortel

Aesthetics (also code rules)
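
For illustration, a self-contained sketch of the pattern under discussion with the C-style casts replaced by reinterpret_cast; the buffer layout, helper lambdas and field names are placeholders, not the actual TrackingRecHit2D code:

    #include <cstdint>
    #include <memory>

    // Placeholder SoA view with the same flavour of fields as in the quoted diff.
    struct View {
      int32_t* m_charge;
      int16_t* m_xsize;
      int16_t* m_ysize;
    };

    int main() {
      constexpr int nHits = 1024;
      auto store32 = std::make_unique<float[]>(nHits * 10);    // 32-bit columns
      auto store16 = std::make_unique<uint16_t[]>(nHits * 4);  // 16-bit columns

      // helpers returning the i-th column, in the spirit of get32()/get16() in the PR
      auto get32 = [&](int i) { return store32.get() + i * nHits; };
      auto get16 = [&](int i) { return store16.get() + i * nHits; };

      View view;
      // reinterpret_cast instead of C-style casts, as suggested in the review
      view.m_charge = reinterpret_cast<int32_t*>(get32(8));
      view.m_xsize  = reinterpret_cast<int16_t*>(get16(2));
      view.m_ysize  = reinterpret_cast<int16_t*>(get16(3));

      view.m_charge[0] = 42;  // columns are then used through the typed pointers
      return 0;
    }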

@@ -9,6 +9,7 @@
#include "DataFormats/GeometrySurface/interface/SOARotation.h"
#include "Geometry/TrackerGeometryBuilder/interface/phase1PixelTopology.h"
#include "HeterogeneousCore/CUDAUtilities/interface/cuda_cxx17.h"
#include "HeterogeneousCore/CUDAUtilities/interface/cudaCompat.h"

@makortel

Is this include only for testing purposes, or really needed at the moment?

@VinInn (Author)

not strictly needed at the moment...

@VinInn (Author)

Actually, it is needed, due to the device functions....

@makortel

So the header gets included in some CPU .cc file as well? Ok.

@VinInn (Author)

Yes, there is the possibility of using this very CPE in the standard CPU workflows...

CUDAProduct<TrackingRecHit2DCUDA> const& inputDataWrapped = iEvent.get(tokenHit_);

// try to be in parallel with tracking
CUDAScopedContext ctx{iEvent.streamID(), std::move(waitingTaskHolder)};

@makortel

After #305 this comment is not really accurate, and the context should be constructed as

Suggested change
CUDAScopedContext ctx{iEvent.streamID(), std::move(waitingTaskHolder)};
CUDAScopedContext ctx{inputDataWrapped, std::move(waitingTaskHolder)};

This work and the pixel tracking will be run in separate CUDA streams.

@VinInn (Author)

ok

heterogeneous::GPUCuda,
heterogeneous::CPU
> > {
class SiPixelRecHitHeterogeneous : public edm::global::EDProducer<> {

@makortel

I'd suggest (eventually) renaming this class to SiPixelRecHitCUDA.

@VinInn (Author)

Indeed, that is in the plan.


convertGPUtoCPU(iEvent.event(), hclusters, *output);
}
gpuAlgo_.makeHitsAsync(hits,digis, clusters, bs, fcpe->getGPUProductAsync(ctx.stream()), ctx.stream());

@makortel

Alternatively makeHitsAsync() could construct and return TrackingRecHit2DCUDA.

@VinInn (Author)

Originally I thought to keep the CPU class with its storage separate in the producer, and to have the algo depend only on the View. This did not work out, as one needs more pointers on the host.
So yes, it is a possibility.
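
As a generic, self-contained illustration of the two alternatives being discussed (dummy types and names, not the actual PR classes): the algorithm can either fill a product owned by the producer, or construct and return the product itself.

    #include <vector>

    // Dummy stand-in for the SoA hit product (not TrackingRecHit2DCUDA).
    struct HitsSoA {
      std::vector<float> xLocal, yLocal;
    };

    // Option A: the caller owns the product; the algorithm only fills it.
    void fillHits(HitsSoA& hits) {
      hits.xLocal.assign(10, 0.f);
      hits.yLocal.assign(10, 0.f);
    }

    // Option B (as suggested above): the algorithm constructs and returns the product.
    HitsSoA makeHits() {
      HitsSoA hits;
      hits.xLocal.assign(10, 0.f);
      hits.yLocal.assign(10, 0.f);
      return hits;  // moved (or elided), so no extra copy
    }

    int main() {
      HitsSoA a;
      fillHits(a);             // producer-owned storage
      HitsSoA b = makeHits();  // algorithm-owned, returned by value
      return 0;
    }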

int16_t * iph = &hits.iphi(0);
float * xl = &hits.xLocal(0); float * yl = &hits.yLocal(0);
float * xe = &hits.xerrLocal(0); float * ye = &hits.yerrLocal(0);
int16_t * xs = &hits.clusterSizeX(0); int16_t * ys = &hits.clusterSizeY(0);

@makortel

Indentation is off.

@VinInn (Author)

Yeah, it was just copied from the signature.
Will fix and beautify.

siPixelRecHitsLegacyPreSplitting = cms.VPSet(
cms.PSet(type = cms.string("SiPixelRecHitedmNewDetSetVector"))
)
)

@makortel

IIUC, in the RecHit case there is no need for an alias, as SiPixelRecHitFromSOA and the legacy SiPixelRecHitConverter produce the same products, so this could simply be

    cuda = _siPixelRecHitFromSOA.clone()

(after moving the corresponding import above this line)

@VinInn (Author)

OK, thanks. I had just tried to make it work while waiting for your explanation of what all of that means.

@@ -68,13 +70,19 @@ PixelCPEFast::PixelCPEFast(edm::ParameterSet const & conf,

const pixelCPEforGPU::ParamsOnGPU *PixelCPEFast::getGPUProductAsync(cuda::stream_t<>& cudaStream) const {
const auto& data = gpuData_.dataForCurrentDeviceAsync(cudaStream, [this](GPUData& data, cuda::stream_t<>& stream) {

std::cout << "coping pixelCPEforGPU" << std::endl;
//here or above???

@makortel

If you want it to print out when the transfer is initiated, this is the correct place. ("Above", outside of the lambda, would print every time the product is asked for.)

@VinInn (Author)

Yes, indeed. I just wanted to make sure it was really done once.
(I was planning to fill in some constant stuff here, but that does not work across libraries.)
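
For illustration, a simplified CPU-only sketch of the point made above: the filler callback runs only when the data is first requested (i.e. when the transfer would be initiated), while code outside it runs on every request. This is a toy stand-in, not the actual dataForCurrentDeviceAsync() implementation:

    #include <functional>
    #include <iostream>
    #include <optional>

    // Toy per-device cache: the filler callback runs only on the first request.
    template <typename T>
    class OncePerDevice {
    public:
      const T& dataAsync(const std::function<void(T&)>& fill) {
        if (!data_) {  // first request: "initiate the transfer"
          data_.emplace();
          fill(*data_);
        }
        return *data_;
      }

    private:
      std::optional<T> data_;
    };

    struct ParamsOnGPU {
      float cut = 0.f;
    };

    int main() {
      OncePerDevice<ParamsOnGPU> cache;
      for (int event = 0; event < 3; ++event) {
        std::cout << "asking for the product" << std::endl;  // printed on every request ("above")
        const auto& params = cache.dataAsync([](ParamsOnGPU& p) {
          std::cout << "copying pixelCPEforGPU" << std::endl;  // printed once ("here")
          p.cut = 0.1f;
        });
        (void)params;
      }
      return 0;
    }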

@fwyzard added the Pixels (Pixels-related developments) label, Apr 16, 2019

fwyzard commented Apr 30, 2019

This PR should be superseded by #324 / #329, apart from #324 (comment).

@fwyzard closed this, Apr 30, 2019
Labels: Pixels (Pixels-related developments)
9 participants